New words detection method for microblog text based on integrating of rules and statistics
ZHOU Shuangshuang, XU Jin'an, CHEN Yufeng, ZHANG Yujie
Journal of Computer Applications    2017, 37 (4): 1044-1050.   DOI: 10.11772/j.issn.1001-9081.2017.04.1044
The formation rules of new words in microblogs are extremely complex and highly dispersed, and extraction with the traditional C/NC-value method has several problems, including relatively inaccurate boundaries for identified new words and low detection accuracy for low-frequency new words. To solve these problems, a method integrating heuristic rules, a modified C/NC-value method, and a Conditional Random Field (CRF) model was proposed. On the one hand, the heuristic rules comprised abstracted classification information and inductive rules focusing on the components of microblog new words. The rules were summarized manually from Part Of Speech (POS), character types, and symbols by observing a large number of microblog documents. On the other hand, to improve the boundary accuracy of identified new words and the detection accuracy of low-frequency new words, the traditional C/NC-value method was modified by merging word frequency, branch entropy, mutual information, and other statistical features to reconstruct the objective function. Finally, a CRF model was trained and used to detect new words. The experimental results show that the proposed method effectively improves the F value of new word detection.
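As a rough illustration of the statistical features named in the abstract (word frequency, branch entropy, mutual information), the following Python sketch computes them for a candidate string over a toy corpus. It is not the authors' modified C/NC-value objective function, which the abstract does not specify. The function names, the base-2 logarithm, the character-level probability estimate, and the use of the minimum pointwise mutual information over all binary splits are assumptions made for illustration only.

```python
import math
from collections import Counter


def count_occurrences(corpus, s):
    """Count (possibly overlapping) occurrences of s across all texts."""
    total = 0
    for text in corpus:
        start = text.find(s)
        while start != -1:
            total += 1
            start = text.find(s, start + 1)
    return total


def branch_entropy(neighbors):
    """Shannon entropy (bits) of a Counter of neighboring characters."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def candidate_features(corpus, candidate):
    """Return frequency, left/right branch entropy, and minimum PMI.

    Assumption: probabilities are estimated as count / total characters,
    and PMI is taken as the minimum over all two-part splits of the candidate.
    """
    left, right = Counter(), Counter()
    freq = 0
    n = len(candidate)
    for text in corpus:
        start = text.find(candidate)
        while start != -1:
            freq += 1
            if start > 0:
                left[text[start - 1]] += 1   # character just before the match
            end = start + n
            if end < len(text):
                right[text[end]] += 1        # character just after the match
            start = text.find(candidate, start + 1)

    min_pmi = None
    total_chars = sum(len(text) for text in corpus)
    if freq > 0 and n >= 2:
        p_whole = freq / total_chars
        pmis = []
        for i in range(1, n):
            p_a = count_occurrences(corpus, candidate[:i]) / total_chars
            p_b = count_occurrences(corpus, candidate[i:]) / total_chars
            pmis.append(math.log2(p_whole / (p_a * p_b)))
        min_pmi = min(pmis)

    return {
        "freq": freq,
        "left_entropy": branch_entropy(left),
        "right_entropy": branch_entropy(right),
        "min_pmi": min_pmi,
    }


# Toy usage: two short "documents" and one candidate string.
toy_corpus = ["abxab", "cabd"]
features = candidate_features(toy_corpus, "ab")
print(features)
```

Intuitively, high branch entropy on both sides suggests the candidate appears in varied contexts (a plausible word boundary), while high mutual information suggests its parts cohere. How these signals are weighted against frequency in the modified objective function is described in the paper itself, not in this sketch.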